{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial: Accelerated Hyperparameter Tuning For PyTorch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### In this tutorial, we'll show you how to leverage advanced hyperparameter tuning techniques with Tune.\n",
    "\n",
    "<img src=\"tune-arch-simple.png\" alt=\"Tune Logo\" width=\"600\"/>\n",
    "\n",
    "Specifically, we'll leverage ASHA and Bayesian Optimization (via HyperOpt) without modifying your underlying code.\n",
    "\n",
    "Tune is a scalable framework for model training and hyperparameter search with a focus on deep learning and deep reinforcement learning.\n",
    "\n",
    "* **Code**: https://github.com/ray-project/ray/tree/master/python/ray/tune \n",
    "* **Examples**: https://github.com/ray-project/ray/tree/master/python/ray/tune/examples\n",
    "* **Documentation**: http://ray.readthedocs.io/en/latest/tune.html\n",
    "* **Mailing List** https://groups.google.com/forum/#!forum/ray-dev"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## If you are running on Google Colab, uncomment below to install the necessary dependencies \n",
    "## before beginning the exercise.\n",
    "\n",
    "# print(\"Setting up colab environment\")\n",
    "# !pip uninstall -y -q pyarrow\n",
    "# !pip install -q -U ray[tune]\n",
    "# !pip install -q ray[debug]\n",
    "\n",
    "# # A hack to force the runtime to restart, needed to include the above dependencies.\n",
    "# print(\"Done installing! Restarting via forced crash.\")\n",
    "# import os\n",
    "# os._exit(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# # If you are running on Google Colab, please install TensorFlow 2.0 by uncommenting below..\n",
    "\n",
    "# try:\n",
    "#   # %tensorflow_version only exists in Colab.\n",
    "#   %tensorflow_version 2.x\n",
    "# except Exception:\n",
    "#   pass"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 1: PyTorch Imports Code"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import tensorflow as tf\n",
    "try:\n",
    "    tf.get_logger().setLevel('INFO')\n",
    "except Exception as exc:\n",
    "    print(exc)\n",
    "import warnings\n",
    "warnings.simplefilter(\"ignore\")\n",
    "\n",
    "import numpy as np\n",
    "import torch\n",
    "import torch.optim as optim\n",
    "from torchvision import datasets\n",
    "from ray.tune.examples.mnist_pytorch import train, test, ConvNet, get_data_loaders\n",
    "\n",
    "import ray\n",
    "from ray import tune\n",
    "from ray.tune import track\n",
    "from ray.tune.schedulers import AsyncHyperBandScheduler\n",
    "\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "import matplotlib.style as style\n",
    "style.use(\"ggplot\")\n",
    "\n",
    "datasets.MNIST(\"~/data\", train=True, download=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We import some helper functions above. For example, `train` is simply a for loop over the data loader.\n",
    "\n",
    "```python\n",
    "    def train(model, optimizer, train_loader):\n",
    "        model.train()\n",
    "        for batch_idx, (data, target) in enumerate(train_loader):\n",
    "            if batch_idx * len(data) > EPOCH_SIZE:\n",
    "                return\n",
    "            optimizer.zero_grad()\n",
    "            output = model(data)\n",
    "            loss = F.nll_loss(output, target)\n",
    "            loss.backward()\n",
    "            optimizer.step()\n",
    "```\n",
    "\n",
    "In order to make decisions in the middle of training, we need to let the training function notify Tune. The ``tune.track`` API allows Tune to keep track of current results.\n",
    "\n",
    "**TODO**: Add `tune.track.log(mean_accuracy=acc)` within the training loop. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def train_mnist(config):\n",
    "    model = ConvNet()\n",
    "    train_loader, test_loader = get_data_loaders()\n",
    "\n",
    "    optimizer = optim.SGD(\n",
    "        model.parameters(), lr=config[\"lr\"], momentum=config[\"momentum\"])\n",
    "\n",
    "    for i in range(20):\n",
    "        train(model, optimizer, train_loader)  # Train for 1 epoch\n",
    "        acc = test(model, test_loader)  # Obtain validation accuracy.\n",
    "        # TODO: Add tune.report(mean_accuracy=acc) here\n",
    "        if i % 5 == 0:\n",
    "            torch.save(model, \"./model.pth\") # This saves the model to the trial directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example Trial Run\n",
    "\n",
    "Let's run 1 trial, randomly sampling from a uniform distribution for learning rate and momentum. \n",
    "\n",
    "A \"trial\" is the execution of training using a set of hyperparameters. An **experiment** is a set of trials (i.e., a hyperparameter search).\n",
    "\n",
    "Run the below cell to run Tune. \n",
    "\n",
    "#### This is one random sample and should perform poorly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "np.random.seed(1)\n",
    "search_space = {\n",
    "    \"lr\": tune.loguniform(0.0001, 0.1),\n",
    "    \"momentum\": tune.uniform(0.1, 0.9)\n",
    "}\n",
    "\n",
    "ray.shutdown()  # Restart Ray defensively in case the ray connection is lost. \n",
    "ray.init(log_to_driver=False)\n",
    "\n",
    "\n",
    "analysis = tune.run(\n",
    "    train_mnist, \n",
    "    config=search_space, \n",
    "    verbose=1,\n",
    "    name=\"train_mnist\",  # This is used to specify the logging directory.\n",
    "    stop={\"mean_accuracy\": 0.98}  # This will stop the trial \n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Plot the performance of this trial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dfs = analysis.fetch_trial_dataframes()\n",
    "[d.mean_accuracy.plot() for d in dfs.values()]\n",
    "plt.xlabel(\"epoch\"); plt.ylabel(\"Test Accuracy\"); "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 2: Efficient Grid Search with Early Stopping\n",
    "\n",
    "\n",
    "Tune provides a `tune.grid_search` primitive to pass into `tune.run` as follows:\n",
    "```python\n",
    "tune.run(config={\"variable\": tune.grid_search([1, 2, 3])})\n",
    "```\n",
    "\n",
    "From this, Tune will run 3 trials, evaluating each value in the grid search. To specify a multi-dimensional grid search, you can use `tune.grid_search` on multiple variables:\n",
    "\n",
    "\n",
    "```python\n",
    "tune.run(config={\n",
    "    \"variable1\": tune.grid_search([1, 2, 3]),\n",
    "    \"variable2\": tune.grid_search([1, 2, 3]),\n",
    "    \"variable3\": tune.grid_search([1, 2, 3]),\n",
    "    \"variable4\": tune.grid_search([1, 2, 3]),\n",
    "})\n",
    "```\n",
    "\n",
    "This will generate a total $3 * 3 * 3 * 3 = 81$ trials.\n",
    "\n",
    "Below, we will set up a grid search over the \"lr\" and \"momentum\" hyperparameters.\n",
    "\n",
    "#### FAQ: what is \"lr\"?\n",
    "\n",
    "\"lr\" stands for \"learning rate\".\n",
    "Deep learning neural networks are trained using the stochastic gradient descent (SGD) algorithm.\n",
    "\n",
    "SGD estimates the error gradient for the current state of the model using examples from the training dataset, then updates the weights of the model using the back-propagation of errors algorithm, referred to as simply backpropagation.\n",
    "\n",
    "The amount that the weights are updated during training is referred to as the step size or the “learning rate.”\n",
    "\n",
    "(Taken from https://machinelearningmastery.com/understand-the-dynamics-of-learning-rate-on-deep-learning-neural-networks/)\n",
    "\n",
    "####  FAQ: what is \"momentum\"?\n",
    "\n",
    "Momentum is a hyperparameter to the SGD aglorithm.\n",
    "\n",
    "An exponentially weighted average of the prior updates to the weight can be included when the weights are updated. This change to stochastic gradient descent is called “momentum” and adds inertia to the update procedure, causing many past updates in one direction to continue in that direction in the future.\n",
    "\n",
    "(Taken from https://machinelearningmastery.com/learning-rate-for-deep-learning-neural-networks/)\n",
    "\n",
    "### Instructions: \n",
    "**TODO**: Specify a multi-dimensional grid search, gridding over `lr` and `momentum`. Choose 5 values between 0.001 to 0.9 for both values.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# TODO: Specify a multi-dimensional grid search, gridding over lr and momentum. \n",
    "# Choose 5 values between 0.001 to 0.9 for both values.\n",
    "hyperparameter_space = {\n",
    "    \"lr\": \"TODO\"\n",
    "    \"momentum\":  \"TODO\"\n",
    "}\n",
    "\n",
    "assert \"grid_search\" in hyperparameter_space.get(\"lr\") \n",
    "assert \"grid_search\" in hyperparameter_space.get(\"momentum\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Using an early-stopping algorithm\n",
    "\n",
    "An efficient hyperparameter optimization avoids training low-performing trials. This is one of the main inefficiencies of a grid search. \n",
    "\n",
    "In Tune, we can avoid this by using state-of-the-art search algorithms such as Asynchronous Successive Halving Algorithm (ASHA). ASHA is a scalable algorithm for principled early stopping. How does it work? On a high level, it terminates trials that are less promising and allocates more time and resources to more promising trials. \n",
    "\n",
    "    The successive halving algorithm begins with all candidate configurations in the base \"halving\" iteration and proceeds as follows:\n",
    "\n",
    "        1. Uniformly allocate a budget to a set of candidate hyperparameter configurations in a given \"halving\" iteration.\n",
    "        2. Evaluate the performance of all candidate configurations.\n",
    "        3. Promote the top half of candidate configurations to the next \"halving\" iteration.\n",
    "        4. Double the budget per configuration for the next \"halving\" iteration and repeat until one configurations remains. \n",
    "        \n",
    "A textual representation (\"iter\" stands for *'\"halving\" iteration'*):\n",
    "    \n",
    "           | Configurations | Epochs per \n",
    "           | Remaining      | Configuration\n",
    "    ---------------------------------------\n",
    "    iter 1 | 27             | 1\n",
    "    iter 2 | 9              | 3\n",
    "    iter 3 | 3              | 9\n",
    "    iter 4 | 1              | 27\n",
    "\n",
    "(from https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/ for the ASHA blog post and paper)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Instructions: \n",
    "**TODO**: Set up ASHA.\n",
    "\n",
    "1) Create an ASHA \"Scheduler\" (ASHA). A scheduler decides which trials to run, stop, or pause. \n",
    "```python\n",
    "from ray.tune.schedulers import ASHAScheduler\n",
    "\n",
    "custom_scheduler = ASHAScheduler(\n",
    "    metric='mean_accuracy',\n",
    "    mode=\"max\",\n",
    "    grace_period=1,\n",
    ")\n",
    "```\n",
    "\n",
    "*Note: Read the documentation on this step at https://ray.readthedocs.io/en/latest/tune-schedulers.html#asynchronous-hyperband or call ``help(tune.schedulers.ASHAScheduler)`` to learn more about the ASHA Scheduler*\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "#### FAQ: How do I debug things in Tune?\n",
    "\n",
    "The `error file` column will show up in the output. Run the below cell with the ``error file`` path to diagnose your issue.\n",
    "\n",
    "```\n",
    "! cat /home/ubuntu/tune_iris/tune_iris_c66e1100_2019-10-09_17-13-24x_swb9xs/error_2019-10-09_17-13-29.txt\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from ray.tune.schedulers import ASHAScheduler\n",
    "\n",
    "ray.shutdown()  # Restart Ray defensively in case the ray connection is lost. \n",
    "ray.init(log_to_driver=False)\n",
    "\n",
    "\n",
    "custom_scheduler = None  # TODO: Add a ASHA as custom scheduler here\n",
    "\n",
    "\n",
    "analysis = tune.run(\n",
    "    train_mnist, \n",
    "    scheduler=custom_scheduler, \n",
    "    config=hyperparameter_space, \n",
    "    verbose=1,\n",
    "    name=\"train_mnist\"  # This is used to specify the logging directory.\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Let's plot our results by wall-clock time and epoch. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot by wall-clock time\n",
    "\n",
    "dfs = analysis.fetch_trial_dataframes()\n",
    "# This plots everything on the same plot\n",
    "ax = None\n",
    "for d in dfs.values():\n",
    "    ax = d.plot(\"timestamp\", \"mean_accuracy\", ax=ax, legend=False)\n",
    "plt.xlabel(\"timestamp\"); plt.ylabel(\"Test Accuracy\"); "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Plot by epoch\n",
    "ax = None\n",
    "for d in dfs.values():\n",
    "    ax = d.mean_accuracy.plot(ax=ax, legend=False)\n",
    "plt.xlabel(\"epoch\"); plt.ylabel(\"Test Accuracy\"); "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Exercise 3: Search Algorithms in Tune\n",
    "\n",
    "Tune enables you to scale existing hyperparameter search libraries such as HyperOpt (https://github.com/hyperopt/hyperopt). In this setting, use the external library's hyperparameter space specification instead of Tune's configuration.\n",
    "\n",
    "Search algorithms can limit the number of concurrent hyperparameters are being evaluated. This is necessary because sometimes the external library is more effective when evaluations are sequential.\n",
    "\n",
    "### Instructions\n",
    "**TODO:** Create a HyperOptSearch object by passing in a HyperOpt specific search space. Also enforce that only 1 trials can run concurrently:\n",
    "\n",
    "```python\n",
    "    hyperopt_search = HyperOptSearch(space, max_concurrent=1, metric=\"mean_accuracy\", mode=\"max\")\n",
    "```\n",
    "\n",
    "Then, plug in `HyperOptSearch` into `tune.run`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hyperopt import hp\n",
    "from ray.tune.suggest.hyperopt import HyperOptSearch\n",
    "\n",
    "# This is a HyperOpt specific hyperparameter space configuration.\n",
    "space = {\n",
    "    \"lr\": hp.loguniform(\"lr\", -10, -1),\n",
    "    \"momentum\": hp.uniform(\"momentum\", 0.1, 0.9),\n",
    "}\n",
    "\n",
    "# TODO: Create a HyperOptSearch object by passing in a HyperOpt specific search space.\n",
    "# Also enforce that only 1 trials can run concurrently.\n",
    "hyperopt_search = \"TODO\" # TODO: Change this\n",
    "\n",
    "# We Remove the dir so that we can visualize tensorboard correctly\n",
    "! rm -rf ~/ray_results/search_algorithm \n",
    "analysis = tune.run(\n",
    "    train_mnist, \n",
    "    num_samples=10,  \n",
    "    search_alg=\"TODO\",  #  TODO: Change this\n",
    "    verbose=1,\n",
    "    name=\"search_algorithm\"  # This is used to specify the logging directory.\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Extra - use Tensorboard for results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can use TensorBoard to view trial performances. If the graphs do not load, click `Toggle All Runs`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext tensorboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "%tensorboard --logdir ~/ray_results/search_algorithm"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Extra: Using GPUs."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If your machine has a GPU, you can use the `resources_per_trial` argument to specify that the trial should use a GPU. This allows Tune to automatically set the `CUDA_VISIBLE_DEVICES` for the trial and enforce resource isolation (i.e., 1 trial per GPU at a time)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "analysis = tune.run(\n",
    "    train_mnist, \n",
    "    num_samples=10,  \n",
    "    resources_per_trial={\"GPU\": 1},\n",
    "    verbose=1,\n",
    "    name=\"use_gpu\"  # This is used to specify the logging directory.\n",
    ")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
